Analysis of Hadoop’s Performance under Failures
Abstract
Failures are common in today’s data center environments and can significantly impact the performance of important jobs running on top of large-scale computing frameworks. In this paper, we analyze Hadoop’s behavior under compute node and process failures. Surprisingly, we find that even a single failure can have a large detrimental effect on job running times. We uncover several important design decisions underlying this distressing behavior: the inefficiency of Hadoop’s statistical speculative execution algorithm, the lack of failure-information sharing, and the overloading of TCP failure semantics. We hope that our study will add new dimensions to the pursuit of robust large-scale computing framework designs.
Similar resources
Mochi: Visual Log-Analysis Based Tools for Debugging Hadoop
Mochi, a new visual, log-analysis-based debugging tool, correlates Hadoop’s behavior in space, time and volume, and extracts a causal, unified control- and dataflow model of Hadoop across the nodes of a cluster. Mochi’s analysis produces visualizations of Hadoop’s behavior that users can use to reason about and debug performance issues. We provide examples of Mochi’s value in revealing a Hadoop jo...
Robust Control of a Quadrotor in the Presence of Actuators' Failure
Today, robots and unmanned aerial vehicles are used extensively in modern societies. Owing to their wide range of applications, they have attracted much attention among scientists over the past decades. This paper deals with the problem of the stability of a four-rotor flying robot called a quadrotor, which is an under-actuated system, in the presence of operator or sensor failures. The dynamica...
Hadoop’s Overload Tolerant Design Exacerbates Failure Detection and Recovery
Data processing frameworks like Hadoop need to efficiently address failures, which are common occurrences in today’s large-scale data center environments. Failures have a detrimental effect on the interactions between the framework’s processes. Unfortunately, certain adverse but temporary conditions such as network or machine overload can have a similar effect. Treating this effect oblivious to...
Analysis of an M/G/1 Queue with Multiple Vacations, N-policy, Unreliable Service Station and Repair Facility Failures
This paper studies an M/G/1 repairable queueing system with multiple vacations and N-policy, in which the service station is subject to occasional random breakdowns. When the service station breaks down, it is repaired by a repair facility. Moreover, the repair facility may fail during the repair period of the service station. The failed repair facility resumes repair after completion of its re...